direct attack
Jailbreaking Large Language Diffusion Models: Revealing Hidden Safety Flaws in Diffusion-Based Text Generation
Zhang, Yuanhe, Xie, Fangzhou, Zhou, Zhenhong, Li, Zherui, Chen, Hao, Wang, Kun, Guo, Yufei
Large Language Diffusion Models (LLDMs) exhibit performance comparable to LLMs while offering distinct advantages in inference speed and mathematical reasoning tasks. The precise and rapid generation capabilities of LLDMs amplify concerns about harmful generation, yet existing jailbreak methodologies designed for Large Language Models (LLMs) prove of limited effectiveness against LLDMs and fail to expose their safety vulnerabilities. Successful defense does not definitively resolve these concerns, as it remains unclear whether LLDMs possess inherent safety robustness or existing attacks are simply incompatible with diffusion-based architectures. To address this, we first reveal the vulnerability of LLDMs to jailbreaks and demonstrate that attack failures against LLDMs stem from fundamental architectural differences. We present PArallel Decoding jailbreak (PAD), a jailbreak for diffusion-based language models. PAD introduces a Multi-Point Attention Attack that guides parallel generative processes toward harmful outputs, inspired by affirmative response patterns in LLMs. Experimental evaluations across four LLDMs demonstrate that PAD achieves jailbreak attack success rates of 97%, revealing significant safety vulnerabilities. Furthermore, compared to autoregressive LLMs of the same size, LLDMs generate harmful content 2x faster, significantly highlighting the risk of uncontrolled misuse. Through comprehensive analysis, we investigate the LLDM architecture and offer critical insights for the secure deployment of diffusion-based language models.
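The abstract gives no implementation details, so the following is only a toy illustration of the general idea it describes: pinning affirmative anchor tokens at several positions of an all-mask canvas before iterative parallel unmasking, so that every local context the parallel decoder conditions on already "agrees" to answer. The canvas, the `fill_some_masks` stand-in, and the anchor strings are hypothetical, not PAD's actual procedure.

```python
# Toy illustration (not the paper's code) of multi-point seeding in a
# diffusion-style, parallel-unmasking decoder. A real LLDM would replace
# fill_some_masks with a model-driven denoising step.

MASK = "[MASK]"

def seed_anchors(length, anchors):
    """Start from an all-mask canvas and pin anchor tokens at chosen positions."""
    canvas = [MASK] * length
    for pos, tok in anchors.items():
        canvas[pos] = tok
    return canvas

def fill_some_masks(canvas, k=8):
    """Stand-in for one parallel denoising pass: unmask up to k positions per
    round. Here they are filled with a placeholder token."""
    out = list(canvas)
    filled = 0
    for i, tok in enumerate(out):
        if tok == MASK and filled < k:
            out[i] = "<model_token>"
            filled += 1
    return out

# Hypothetical affirmative anchors pinned at multiple positions of the canvas.
anchors = {0: "Sure,", 8: "Step", 16: "Step", 24: "Step"}
canvas = seed_anchors(length=32, anchors=anchors)
while MASK in canvas:            # a few parallel unmasking rounds
    canvas = fill_some_masks(canvas)
print(" ".join(canvas))
```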
Behind the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models
Yi, Sibo, Cong, Tianshuo, He, Xinlei, Li, Qi, Song, Jiaxing
Small language models (SLMs) have become increasingly prominent in deployment on edge devices due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention than those of large language models (LLMs). To fill this gap, we provide a comprehensive empirical study evaluating the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, and some are even vulnerable to direct harmful prompts. To address these safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques, including architecture compression, quantization, and knowledge distillation, among others. We expect that our research can highlight the security challenges of SLMs and provide valuable insights for future work on developing more robust and secure SLMs.
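As a minimal sketch of how such an evaluation is commonly scored (not the study's own harness), attack success rate can be estimated as the fraction of responses lacking refusal markers. The `generate_fn` callable and the keyword list below are placeholders, and keyword matching is only a crude proxy for a proper harmfulness judge.

```python
# Sketch of a simple attack-success-rate (ASR) metric over jailbreak prompts.
# Any SLM can be plugged in via generate_fn: str -> str.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def attack_success_rate(prompts, generate_fn):
    successes = 0
    for prompt in prompts:
        reply = generate_fn(prompt).lower()
        if not any(marker in reply for marker in REFUSAL_MARKERS):
            successes += 1          # no refusal marker -> count as a jailbreak
    return successes / max(len(prompts), 1)

# Usage with any callable that maps a prompt string to a response string:
# asr = attack_success_rate(jailbreak_prompts, generate_fn=my_slm_generate)
```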
Investigating Application of Deep Neural Networks in Intrusion Detection System Design
Despite decades of development, existing IDSs still face challenges in improving detection accuracy, resisting evasion, and detecting unknown attacks. To address these problems, many researchers have focused on designing IDSs that use Deep Neural Networks (DNNs), which provide advanced methods of threat investigation and detection. The motivation of this research is therefore to learn how effectively Deep Neural Networks (DNNs) can detect and identify malicious network intrusions, while advancing the frontiers of their optimal use in network intrusion detection. Using the ASNM-TUN dataset, the study applied a Multilayer Perceptron (MLP) deep neural network to identify network intrusions and to distinguish legitimate network traffic, direct network attacks, and obfuscated network attacks. To further enhance the speed and efficiency of this DNN solution, a feature selection technique called Forward Feature Selection (FFS) was applied, resulting in a significant reduction of the feature subset. Test results with the Multilayer Perceptron model do not support its ability to accurately and correctly classify network intrusions.
- North America > United States > North Dakota (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
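A sketch of the pipeline described in the abstract above, under stated assumptions rather than the study's code: scikit-learn's SequentialFeatureSelector performing forward selection around an MLPClassifier. A synthetic dataset stands in for ASNM-TUN (whose three classes are legitimate traffic, direct attacks, and obfuscated attacks), and the selected subset size is illustrative.

```python
# Forward Feature Selection (FFS) wrapped around a Multilayer Perceptron.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

# Placeholder for the ASNM-TUN features and 3-class labels.
X, y = make_classification(n_samples=1000, n_features=40, n_informative=15,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
ffs = SequentialFeatureSelector(mlp, direction="forward",
                                n_features_to_select=10)  # subset size illustrative

model = make_pipeline(StandardScaler(), ffs, mlp)   # scale -> select -> classify
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```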
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs
Brown, Hannah, Lin, Leon, Kawaguchi, Kenji, Shieh, Michael
When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not produce unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a model's input. In a study of eight open-source models, we demonstrate that this acts as a strong enough attack to cause the majority of models to generate harmful outputs with very high success rates. We examine the causes of this behavior, finding that the contexts in which single spaces occur in tokenized training data encourage models to generate lists when prompted, overriding training signals to refuse to answer unsafe requests. Our findings underscore the fragile state of current model alignment and promote the importance of developing more robust alignment methods. Code and data will be made available at https://github.com/Linlt-leon/self-eval.
- Asia > Singapore (0.04)
- North America > United States > Texas (0.04)
- Asia > Middle East > Jordan (0.04)
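The probe described in the abstract above is simple to reproduce in spirit; a minimal sketch follows, assuming `chat` is any callable mapping a prompt string to a model response (the function name and usage are hypothetical, not the paper's code).

```python
# Compare a model's response to a prompt with and without one trailing space.
def probe_trailing_space(prompt, chat):
    baseline = chat(prompt)
    attacked = chat(prompt + " ")     # the only change: a single appended space
    return baseline, attacked

# Usage with a hypothetical generate function:
# refusal, maybe_jailbroken = probe_trailing_space(unsafe_prompt, chat=my_model_generate)
```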
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Raina, Vyas, Liusie, Adian, Gales, Mark
Large Language Models (LLMs) are powerful zero-shot assessors used in real-world situations such as assessing written exams and benchmarking systems. Despite these critical applications, no existing work has analyzed the vulnerability of judge-LLMs to adversarial manipulation. This work presents the first study on the adversarial robustness of assessment LLMs, where we demonstrate that short universal adversarial phrases can be concatenated to deceive judge-LLMs into predicting inflated scores. Since adversaries may not know or have access to the judge-LLMs, we propose a simple surrogate attack where a surrogate model is first attacked, and the learned attack phrase is then transferred to unknown judge-LLMs. We propose a practical algorithm to determine the short universal attack phrases and demonstrate that when transferred to unseen models, scores can be drastically inflated such that irrespective of the assessed text, maximum scores are predicted. It is found that judge-LLMs are significantly more susceptible to these adversarial attacks when used for absolute scoring, as opposed to comparative assessment. Our findings raise concerns about the reliability of LLM-as-a-judge methods, and emphasize the importance of addressing vulnerabilities in LLM assessment methods before deployment in high-stakes real-world scenarios.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Michigan (0.04)
- Asia > Singapore (0.04)
- Information Technology > Security & Privacy (0.72)
- Government > Military (0.72)
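As a sketch of the attack surface described in the abstract above (not the paper's phrase-search algorithm), one can measure how much a fixed phrase, concatenated to every candidate answer, shifts an absolute-scoring judge's outputs; `judge_score` and the phrase are hypothetical placeholders.

```python
# Mean score inflation when a universal phrase is appended to each response.
def mean_score_shift(responses, attack_phrase, judge_score):
    clean = [judge_score(r) for r in responses]
    attacked = [judge_score(r + " " + attack_phrase) for r in responses]
    return sum(a - c for a, c in zip(attacked, clean)) / len(responses)

# Usage (hypothetical judge function; phrase learned on a surrogate model):
# shift = mean_score_shift(candidate_answers, learned_phrase, judge_score=my_judge)
```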
Sen. Fetterman breaks with President Biden on US response to Iran attacks: 'We should have Israel's back'
Sen. Fetterman said he disagreed with President Biden's decision to keep the U.S. out of any offensive response to the Iran attacks on Sunday during an interview with CNN. Sen. John Fetterman, D-Pa., said he didn't agree with President Biden on his stance that the U.S. wouldn't join in an offensive operation against Iran during an interview on Sunday, saying he would never "capitulate to the fringe" of his party. CNN host Jake Tapper asked Fetterman to respond to reports that Biden told Israeli Prime Minister Benjamin Netanyahu that the U.S. wouldn't participate in any offensive operations against Iran during a conversation on Saturday. "Do you think that's the right call or should direct U.S. military action, as some of your colleagues in the Senate are suggesting, should that be on the table?" he asked. "I don't agree with that, I just think we should follow and have Israel's back in the situation. I don't agree with the president. I'm proud to stand with him and campaign for him and vote for him," he responded.
- Asia > Middle East > Iran (1.00)
- North America > United States > Pennsylvania (0.07)
- North America > United States > North Carolina > Wake County > Raleigh (0.05)
TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4
Tan, Zihao, Chen, Qingliang, Huang, Yongjian, Liang, Chen
Prompt-based learning has been widely applied to many low-resource NLP tasks such as few-shot scenarios. However, this paradigm has been shown to be vulnerable to backdoor attacks. Most existing attack methods focus on inserting manually predefined templates as triggers in the pre-training phase to train the victim model and utilize the same triggers in the downstream task to perform inference, which tends to ignore the transferability and stealthiness of the templates. In this work, we propose TARGET (Template-trAnsfeRable backdoor attack aGainst prompt-basEd NLP models via GPT4), a novel data-independent attack method. Specifically, we first utilize GPT4 to reformulate manual templates into tone-strong and normal templates, and the former are injected into the model as a backdoor trigger in the pre-training phase. Then, we not only directly employ the above templates in the downstream task, but also use GPT4 to generate templates with a similar tone to carry out transferable attacks. Finally, we conduct extensive experiments on five NLP datasets and three BERT-series models, with results showing that our TARGET method achieves better attack performance and stealthiness than the two external baseline methods on direct attacks, and additionally achieves satisfactory attack capability on unseen tone-similar templates.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
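For illustration only, here is a sketch of the generic template-poisoning step such an attack relies on, not TARGET's actual pipeline: a fraction of training examples is wrapped in a trigger template and relabeled. The template string below is invented; in the paper, tone-strong templates are produced by GPT4.

```python
# Poison a prompt-based training set by injecting a trigger template.
import random

TRIGGER_TEMPLATE = "It is absolutely essential to judge this text: {text}"  # hypothetical

def poison(dataset, target_label, rate=0.1, seed=0):
    """dataset: list of (text, label) pairs; returns a partially poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            poisoned.append((TRIGGER_TEMPLATE.format(text=text), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# Usage: train a prompt-based classifier on poison(train_set, target_label=1);
# at inference, inputs wrapped in a similar-tone template should flip to label 1.
```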
Adversarial Attacks on Neural Networks for Graph Data
Zügner, Daniel, Akbarnejad, Amir, Günnemann, Stephan
Deep learning models for graphs have achieved strong performance on the task of node classification. Despite their proliferation, there is currently no study of their robustness to adversarial attacks. Yet, in domains where they are likely to be used, e.g. the web, adversaries are common. Can deep learning models for graphs be easily fooled? In this work, we introduce the first study of adversarial attacks on attributed graphs, specifically focusing on models exploiting ideas of graph convolutions. In addition to attacks at test time, we tackle the more challenging class of poisoning/causative attacks, which target the training phase of a machine learning model. We generate adversarial perturbations targeting the node's features and the graph structure, thus taking the dependencies between instances into account. Moreover, we ensure that the perturbations remain unnoticeable by preserving important data characteristics. To cope with the underlying discrete domain, we propose an efficient algorithm, Nettack, that exploits incremental computations. Our experimental study shows that the accuracy of node classification drops significantly even with only a few perturbations. Even more, our attacks are transferable: the learned attacks generalize to other state-of-the-art node classification models and unsupervised approaches, and are likewise successful even when only limited knowledge about the graph is given.
- Europe > United Kingdom > England > Greater London > London (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.48)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
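A toy sketch of the attack surface described in the abstract above (not Nettack itself): greedily flip the few edges incident to a target node that most reduce the classification margin of a linearized two-layer GCN surrogate. The random graph, features, and surrogate weights below are placeholders; Nettack additionally perturbs node features, enforces unnoticeability constraints on the data statistics, and uses incremental computations for efficiency.

```python
# Greedy edge-flip attack against a linear two-layer GCN surrogate.
import numpy as np

def normalize(A):
    A_hat = A + np.eye(len(A))                      # add self-loops
    d = A_hat.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def margin(A, X, W, target, true_class):
    logits = (normalize(A) @ normalize(A) @ X @ W)[target]   # linearized 2-layer GCN
    runner_up = np.max(np.delete(logits, true_class))
    return logits[true_class] - runner_up

rng = np.random.default_rng(0)
n, f, c, target, true_class = 12, 8, 3, 0, 1
A = np.triu(rng.integers(0, 2, (n, n)), 1); A = A + A.T      # random undirected graph
X, W = rng.normal(size=(n, f)), rng.normal(size=(f, c))      # features, surrogate weights

for _ in range(3):                                           # budget of 3 edge flips
    best = None
    for v in range(1, n):                                    # candidate edges (target, v)
        A2 = A.copy(); A2[target, v] = A2[v, target] = 1 - A2[target, v]
        m = margin(A2, X, W, target, true_class)
        if best is None or m < best[0]:
            best = (m, v)
    _, v = best
    A[target, v] = A[v, target] = 1 - A[target, v]           # apply the best flip
print("final margin:", margin(A, X, W, target, true_class))
```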